Hierarchical nucleation in deep neural networks
Deep convolutional networks (DCNs) learn meaningful representations in which data that share the same abstract characteristics are mapped progressively closer together. Understanding these representations, and how they are generated, is of clear practical and theoretical interest. In this work we study the evolution of the probability density of the ImageNet dataset across the hidden layers of some state-of-the-art DCNs. We find that the initial layers generate a unimodal probability density, getting rid of any structure irrelevant to classification. In subsequent layers, density peaks arise in a hierarchical fashion that mirrors the semantic hierarchy of the concepts. Density peaks corresponding to single categories appear only close to the output, via a very sharp transition that resembles the nucleation process of a heterogeneous liquid. This process leaves a footprint in the probability density of the output layer, where the topography of the peaks allows the semantic relationships among the categories to be reconstructed.
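As a concrete illustration of the raw ingredient of such an analysis, the sketch below extracts hidden representations at three depths of a pretrained ResNet and estimates the local density of each point with a simple k-nearest-neighbor estimator. This is not the paper's exact pipeline: the layer names, the spatial pooling step, the batch of random inputs, and the choice of k are all illustrative assumptions.

```python
# Hedged sketch: probe the density of hidden representations at a few depths
# of a pretrained ResNet-50. Layer names, pooling, and k are assumptions,
# not the paper's exact protocol.
import numpy as np
import torch
import torchvision.models as models
from sklearn.neighbors import NearestNeighbors

def knn_log_density(X, k=30):
    """Unnormalized k-NN log-density proxy: -log r_k, where r_k is the
    distance to the k-th neighbor. Larger values mean denser regions."""
    dist, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    return -np.log(dist[:, k])  # column 0 is the point itself (distance 0)

model = models.resnet50(weights="IMAGENET1K_V1").eval()

acts = {}
def save(name):
    def hook(module, inp, out):
        # Average over spatial positions so every layer yields one vector
        # per image (a simplification made for this sketch).
        acts[name] = out.mean(dim=(2, 3)).detach().cpu().numpy()
    return hook

modules = dict(model.named_modules())
for name in ("layer1", "layer3", "avgpool"):  # early, middle, late depth
    modules[name].register_forward_hook(save(name))

x = torch.randn(256, 3, 224, 224)  # stand-in for a batch of ImageNet images
with torch.no_grad():
    model(x)

for name in ("layer1", "layer3", "avgpool"):
    rho = knn_log_density(acts[name])
    print(f"{name}: log-density spread = {rho.max() - rho.min():.2f}")
```

On real ImageNet batches, one would track how the peaks of these density estimates split and sharpen with depth; the density-peak identification itself is beyond this sketch.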
An unsupervised tour through the hidden pathways of deep neural networks
The goal of this thesis is to improve our understanding of the internal mechanisms by which deep artificial neural networks create meaningful representations and are able to generalize. We focus on the challenge of characterizing the semantic content of the hidden representations with unsupervised learning tools, partially developed by us and described in this thesis, which harness the low-dimensional structure of the data. Chapter 2 introduces Gride, a method for estimating the intrinsic dimension of the data as an explicit function of the scale, without performing any decimation of the data set. Our approach is based on rigorous distributional results that allow the uncertainty of the estimates to be quantified. Moreover, the method is simple and computationally efficient, since it relies only on the distances between nearest data points. In Chapter 3, we study the evolution of the probability density across the hidden layers of some state-of-the-art deep neural networks. We find that the initial layers generate a unimodal probability density, getting rid of any structure irrelevant to classification. In subsequent layers, density peaks arise in a hierarchical fashion that mirrors the semantic hierarchy of the concepts. This process leaves a footprint in the probability density of the output layer, where the topography of the peaks allows the semantic relationships among the categories to be reconstructed. In Chapter 4, we study the problem of generalization in deep neural networks: adding parameters to a network that interpolates its training data typically improves its generalization performance, at odds with the classical bias-variance trade-off. We show that wide neural networks learn redundant representations instead of overfitting to spurious correlations, and that redundant neurons appear only if the network is regularized and the training error is zero.
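To make the idea behind Gride concrete: in the k = 1 special case (the TwoNN estimator of Facco et al.), the ratio mu_i = r_2(i)/r_1(i) of each point's second- to first-neighbor distance follows a Pareto law whose shape parameter is the intrinsic dimension d, so the maximum-likelihood estimate is d_hat = N / sum_i log(mu_i). Gride generalizes this to ratios r_2k/r_k, giving an estimate at every scale k from the same data. The sketch below implements only the TwoNN special case; the synthetic data set is an assumption for illustration, and the full Gride likelihood is left to Chapter 2.

```python
# Minimal sketch of the TwoNN estimator (the k = 1 special case of Gride).
# The generalization to ratios r_2k / r_k uses the full likelihood derived
# in Chapter 2 and is omitted here.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_id(X):
    """MLE of the intrinsic dimension from second-to-first neighbor
    distance ratios: d_hat = N / sum_i log(r2_i / r1_i)."""
    dist, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    mu = dist[:, 2] / dist[:, 1]           # column 0 is the point itself
    mu = mu[np.isfinite(mu) & (mu > 1.0)]  # guard against duplicate points
    return len(mu) / np.log(mu).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 7))  # synthetic 7-dimensional Gaussian cloud
print(f"estimated ID: {twonn_id(X):.2f}")  # should come out close to 7
```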
Review for NeurIPS paper: Hierarchical nucleation in deep neural networks
Weaknesses: The primary weaknesses are 1) lack of novelty, 2) concern about whether the analysis method is advantageous and appropriate for understanding representation learning in CNNs, and 3) lack of convincing evidence that the four stated hypotheses are valid. Moreover, as argued in A.4, the method is also closely related to CKA [3]. The novelty therefore comes primarily from the specific hypotheses raised by the authors and the methods used to test them. Note that this is not a substantial weakness: if the findings were interesting and the evidence persuasive, acceptance would still be merited. The need to pick a fixed number of discrete neighbors also seems disadvantageous (the kind of neighbor-based measure in question is sketched below).
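For context, the neighbor-based comparison under discussion can be sketched as follows, assuming the standard definition of neighborhood overlap: the fraction of each point's k nearest neighbors shared between two representations, averaged over the data set. The parameter k is the discrete neighbor count the review flags as a hyperparameter.

```python
# Hedged sketch of a neighborhood-overlap measure between two representations
# X1 and X2 of the same data points; k is the discrete neighbor count the
# review calls out.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighborhood_overlap(X1, X2, k=30):
    idx1 = NearestNeighbors(n_neighbors=k + 1).fit(X1).kneighbors(X1)[1]
    idx2 = NearestNeighbors(n_neighbors=k + 1).fit(X2).kneighbors(X2)[1]
    # Drop column 0 (the point itself) and count shared neighbors per point.
    shared = [len(set(a[1:]) & set(b[1:])) for a, b in zip(idx1, idx2)]
    return float(np.mean(shared)) / k

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))
print(neighborhood_overlap(X, X))                            # identical: 1.0
print(neighborhood_overlap(X, rng.normal(size=(1000, 64))))  # unrelated: near 0
```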
Meta-review for NeurIPS paper: Hierarchical nucleation in deep neural networks
All reviewers found the work compelling, and found the analysis of probability densities and the presented hypotheses broadly interesting to the community aiming to understand and visualize the internals of neural networks. In the discussion, reviewers found the rebuttal convincing and the additional experiments helpful in strengthening the hypotheses in the paper. For the camera-ready version, please include the additional figure and address any remaining minor concerns.